Nate Silver, election forecasters, and the case for meta-meta-analysis.

11/2/2012

For the last few news cycles, Nate Silver (and by extension all election forecasters) has faced intense but misguided scrutiny from pundits, especially conservative ones. His critics run the gamut from partisan charlatans to frightened pundits clinging steadfastly to the relevance of their role as raconteurs. It's fun to laugh at the innumerate, wishy-washy arguments these people make. Yet our collective cachinnation muffles a rational strain of the debate: the argument that Silver's model isn't perfect, simply because no model is likely to be perfect. I'll examine this strain of the debate and argue that we need to systematically compare and combine the results of the numerous election forecasting models that have cropped up in the last decade, and of those that will crop up in the years to come.

This is a manifesto for better election prediction by comparing and contrasting prediction models. That's right. We need to aggregate the aggregators or, if you prefer, meta-analyze the meta-analyses.

The most extreme argument that Silver's forecasts are wrong comes from a model widely cited in the conservative media that predicts a landslide victory for Romney. Political scientists Kenneth Bickers and Michael Berry developed the model. Nate Silver has soundly questioned its merits, although only in a series of Tweets. In his Tweet-ique, Silver cited a model developed by Douglas Hibbs as a more meritorious example that favors Romney.

Critiques of Silver's model don't come only from those whose models forecast a Romney victory. Forecasters who predict an Obama win also disagree with one another, and consistently so. Drew Linzer's Votamatic forecast predicts a larger electoral vote victory for Obama than Silver's does. Sam Wang's and Darryl Holman's electoral vote predictions consistently fall between Silver's and Linzer's. These differences have amounted to tens of electoral votes, although the models (with the exception of Linzer's) are converging as the election draws near.

Similarly, forecasters differ in the probability they give for an Obama win. For example, today Wang gives Obama over a 99% probability of winning the electoral vote. Holman gives over 94%. Silver gives about 80%. Those differences might look small as percentages, but in terms of odds, they're big. Wang gives 99 to 1 odds that Obama will win, Holman gives about 16 to 1, and Silver gives about 4 to 1. From a bettor's perspective, those differences are effing huge because the odds, p/(1 - p) for an event with probability p, are a strictly convex function of that probability. That means the odds grow more and more quickly as an event becomes more probable. If you don't know what the Hell I am talking about, look at this graph of odds as a function of the percent chance of an event. See what I mean?
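Here is a minimal Python sketch of that conversion, odds = p/(1 - p), applied to the three probabilities quoted above. The second loop shows the convexity: the same one-point bump in probability buys more and more odds the closer p gets to 1. The function name `odds` is just an illustrative choice.

```python
def odds(p: float) -> float:
    """Convert the probability of an event into odds in its favor (odds to 1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return p / (1.0 - p)


# Win probabilities for Obama quoted in the text.
forecasts = {"Wang": 0.99, "Holman": 0.94, "Silver": 0.80}
for name, p in forecasts.items():
    print(f"{name}: {p:.0%} -> about {odds(p):.0f} to 1")

# Convexity: one extra percentage point adds more odds as p approaches 1.
for p in (0.80, 0.90, 0.98):
    gain = odds(p + 0.01) - odds(p)
    print(f"{p:.2f} -> {p + 0.01:.2f} adds {gain:.2f} to the odds")
```

Going from 80% to 81% barely moves the odds, while going from 98% to 99% roughly doubles them.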

Anyway, those are just the forecasters I follow. HuffPo and WaPo now have their own shiny-looking forecasts, and I'm sure a bunch of other outfits exist that I haven't even heard of. And you know there will be umpteen more forecasting blogs come election 2016, perhaps even by the midterms. Each of these forecasts will present a probabilistic picture of an election at a given point in time. They will use varying methods and, as we've already seen, they will likely produce varying results. Browsing each of them obsessively every day will give you a basic sense of how a campaign race is going. It might just as easily confuse you. Furthermore, your personal biases and unchecked assumptions about which models are more accurate might cause you to subconsciously weight some models more heavily than others, leading to a conclusion about as justified as Wolf Blitzer's or Joe Scarborough's claim that it's a close race.

In the spirit of principled forecasting, subjective and subconscious impressions of the available election forecasts simply cannot stand, lest we instantaneously regress to the (hopefully) waning days of sensationalist campaign punditry. So what is the way forward?

Enter meta-meta-analysis. To understand what that is, you first need to understand what meta-analysis is. Many of the forecasters, like Silver, Wang, Holman, Linzer, and Simon Jackman, use information from many polls to guide their models. Each poll is an independent experiment that measures the probability that someone will vote for a particular candidate. Some models, like Silver's, contrast these experiments by assigning each poll a weight based on its pollster's historical accuracy. All election prediction models (except those that rely only on economic indicators) also combine information from the separate polls to produce their predictions. An analysis that contrasts and/or combines the results of different studies is a meta-analysis.
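To make "combining and contrasting polls" concrete, here is a minimal Python sketch of a weighted poll average. It assumes each poll's weight is its sample size times a hypothetical pollster-accuracy score. This illustrates the general idea only; it is not Silver's actual weighting scheme, and the poll numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    """One poll: the share of respondents favoring a candidate, plus how much we trust it."""
    share: float            # e.g. 0.51 means 51% support
    sample_size: int
    pollster_weight: float  # hypothetical score for the pollster's historical accuracy


def weighted_poll_average(polls: list[Poll]) -> float:
    """Combine polls into one estimate, weighting each by sample size times pollster weight."""
    weights = [p.sample_size * p.pollster_weight for p in polls]
    total = sum(weights)
    if total <= 0:
        raise ValueError("need at least one poll with positive weight")
    return sum(w * p.share for w, p in zip(weights, polls)) / total


# Hypothetical polls of one state (numbers made up for illustration).
polls = [
    Poll(share=0.51, sample_size=800, pollster_weight=1.0),
    Poll(share=0.48, sample_size=1200, pollster_weight=0.6),
    Poll(share=0.53, sample_size=500, pollster_weight=0.9),
]
print(f"Weighted estimate of support: {weighted_poll_average(polls):.3f}")
```

The large but less trusted poll pulls the estimate down less than its sample size alone would suggest, which is the "contrasting" part of the meta-analysis.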

But what if some of the analyses you are combining and contrasting are themselves meta-analyses? That's what would happen if you took my advice and placed the sprawling set of election prediction models under systematic, quantitative scrutiny. I call such a study a meta-meta-analysis, and I'm not the only one who does. Recently, a paper in the Canadian Medical Association Journal did a meta-analysis of meta-analyses of medical research and found many of them problematic.

So why meta-meta-analysis? I've already argued that unsystematic review of election forecast models can lead us astray. A more positive argument is that election prediction meta-meta-analysis would give us a more complete, and possibly more accurate, picture of the current state of a campaign race. The reason is that different election prediction models emphasize different aspects of the race. Some models include variables about the state of the economy. Among those, some models emphasize some variables over others or (in the case of the Berry/Bickers model) include an economic variable that no other model includes. Other models take the polls at face value. The relative emphases on economic variables and polls represent one way in which forecasters differ in their approach to prediction.

Statisticians have argued that we must take into account our uncertainty about which model is best when making predictions. Even if we could measure which model is best, they argue, we shouldn't necessarily take that model as the right answer. That would be as silly as always going with the answer of the pollster whose long-term accuracy is highest. While some election models might predict elections better than others, it is likely that none of them are perfectly accurate. It turns out that we might make better predictions by taking a weighted average of the results of the available models, with the weights proportional to our relative certainty that a given model is correct. If you want to geek out on this concept, read Prashant Nair's excellent PNAS interview with statistician Adrian Raftery, who has applied model averaging methods to topics ranging from weather prediction to demographic forecasting.
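In its simplest form, that weighted average can be sketched in a few lines of Python: normalize the certainty weights so they sum to one, then take the weighted sum of the models' predictions. The win probabilities are the ones quoted earlier for Wang, Holman, and Silver; the certainty weights are hypothetical placeholders. In a full Bayesian model averaging analysis like Raftery's, the weights would be posterior model probabilities estimated from past performance, which this sketch does not attempt.

```python
def model_average(predictions: dict[str, float], certainty: dict[str, float]) -> float:
    """Weighted average of model predictions, weights proportional to our certainty in each model."""
    total = sum(certainty[name] for name in predictions)
    if total <= 0:
        raise ValueError("certainty weights must sum to a positive number")
    return sum(certainty[name] / total * p for name, p in predictions.items())


# Win probabilities for Obama quoted in the text.
predictions = {"Wang": 0.99, "Holman": 0.94, "Silver": 0.80}

# With no evidence favoring any model, give them equal weight.
equal = {name: 1.0 for name in predictions}
print(f"Equal weights: {model_average(predictions, equal):.2f}")

# Hypothetical weights, as if we trusted Wang twice as much as the others.
skewed = {"Wang": 2.0, "Holman": 1.0, "Silver": 1.0}
print(f"Skewed weights: {model_average(predictions, skewed):.2f}")
```

Normalizing inside the function means the weights only need to express relative certainty; they don't have to sum to one beforehand.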

I'm going out on a limb here to say that I'm a pioneer in the model averaging of election forecasting models. My meta-meta-analyses (which you can read here and here, in reverse chronological order) are simple, almost stupidly simple. They combine only two models. They assume that the two models have equal predictive power in the absence of any evidence to the contrary. But they provide a first glimpse at the potential power of election prediction meta-meta-analysis, and they present a simple method for averaging forecasters' electoral vote predictions.

Where do we go from here? First, we need to compare more than two models. Yet election forecasters, particularly those licensed by major media companies, are likely wary of sharing their raw data for fear of being scooped (not so with at least two open-source modelers, Sam Wang and Darryl Holman, who provide their raw electoral vote probability distributions with each forecast update). Second, we need a method to estimate our uncertainty in the predictive power of a given model (until then, I suggest we place equal weight on each model). With such estimates, we can begin to make informed decisions about how to weight models' results when we average them. We would also have some measure of which models are "better". Third, we need to develop methods to communicate the results of these meta-meta-analyses effectively to the public so that people can understand what it all means (at least if they're not frothy-mouthed partisan buffoons or spotlight-grubbing dead-heat-arguing pundits).

So there you have it: a manifesto for meta-meta-analysis of election prediction models. I hope this message reaches election prediction modelers and convinces them that it is worthwhile to set aside our competing interests and Internet market share in the interest of making better predictions that result in a more informed public. I welcome your comments and criticism. Election forecasters, even if you'd never participate in this effort, at least give me your two cents.

Lastly, thanks to all the election forecasters for the important work that they do.

Toward meta-meta-analysis of election polls

10/25/2012

What do Nate Silver, Darryl Holman, Drew Linzer, and Sam Wang all have in common? They all use statistical methods to forecast elections, especially presidential ones. Their models all tend to say the same thing: the odds are pretty good that Obama is going to win. Yet they often make different predictions about the number of electoral votes that, say, Obama will get, and about the probability that Obama would win if an election were held right now.

For example, as of right now, Silver predicts 294 electoral votes for Obama with 3 to 1 odds of an Obama win. Holman predicts an average of 299 electoral votes with 9 to 1 odds of an Obama win. Wang predicts a median of 291 electoral votes, also with 9 to 1 odds of an Obama win. Linzer predicts a whopping 332 electoral votes and doesn't report the probability of an Obama win.

I contacted each of those men to request access to their electoral vote probability distributions. So far, Sam Wang and Darryl Holman have accepted. Drew Linzer declined. Nate Silver hasn't answered, likely because his mailbox is chock full of fan and hate mail.

Wang and Holman now both offer their histograms of electoral vote probabilities on their respective web pages. I went and grabbed these discrete probability distributions and did what a good, albeit naive, model averager would do: I averaged the probability distributions to come up with a summary probability distribution (which, by the way, still sums to one).

This method makes sense because, basically, these guys are estimating 538 parameters, and I'm simply averaging those 538 parameters across the models I currently have access to. I have no reason yet to think those models differ much in predictive power, although the method could later be extended to include weights.
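A minimal Python sketch of that averaging step, assuming each forecaster's histogram has been loaded as a list where entry k is the probability of exactly k electoral votes (k = 0 to 538). Because Wang's and Holman's actual histograms aren't reproduced here, the hypothetical helper `discretized_normal` builds two made-up stand-ins. The optional `weights` argument is where the weighting extension mentioned above would plug in; left out, every model counts equally.

```python
import math
from typing import Optional


def discretized_normal(mean: float, sd: float, max_ev: int = 538) -> list[float]:
    """Hypothetical stand-in for a forecaster's histogram: P(EV = k) for k = 0..max_ev."""
    bumps = [math.exp(-0.5 * ((k - mean) / sd) ** 2) for k in range(max_ev + 1)]
    total = sum(bumps)
    return [b / total for b in bumps]


def average_distributions(dists: list[list[float]],
                          weights: Optional[list[float]] = None) -> list[float]:
    """Pointwise (mixture) average of EV distributions; equal weights unless weights are given."""
    if weights is None:
        weights = [1.0] * len(dists)  # no evidence that any model predicts better
    if len(weights) != len(dists):
        raise ValueError("need one weight per distribution")
    size = len(dists[0])
    if any(len(d) != size for d in dists):
        raise ValueError("all distributions must cover the same EV range")
    total = sum(weights)
    return [sum(w * d[k] for w, d in zip(weights, dists)) / total for k in range(size)]


# Two made-up forecasts standing in for the downloaded histograms.
model_a = discretized_normal(mean=299, sd=25)
model_b = discretized_normal(mean=291, sd=20)
combined = average_distributions([model_a, model_b])
print(f"Combined distribution sums to {sum(combined):.6f}")
```

Averaging probability by probability keeps the result a valid distribution: each input sums to one, so their weighted average does too.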

From the aggregated electoral vote distribution, I calculated the mean, median, 2.5th percentile, and 97.5th percentile of the number of electoral votes (EV) to Obama. I also calculated the probability that Obama will get 270 EV or more, winning him the election.

Mean EV: 296
Median EV: 294
95% interval (2.5th to 97.5th percentile): 261 to 337
Probability Obama wins: over 90%

So roughly 9 to 1 odds that Obama wins, with something like 294 or 296 electoral votes.
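Summary numbers like these can be read off any discrete electoral vote distribution. Here is a minimal Python sketch, assuming the distribution is a list where entry k is P(EV = k). Percentiles use one common convention for discrete data (the smallest total whose cumulative probability reaches the target); other conventions can differ by a vote or so. The toy distribution at the bottom is hypothetical, chosen only because its answers are easy to check by hand.

```python
def ev_mean(pmf: list[float]) -> float:
    """Expected electoral votes, where pmf[k] = P(EV = k)."""
    return sum(k * p for k, p in enumerate(pmf))


def ev_quantile(pmf: list[float], q: float, tol: float = 1e-12) -> int:
    """Smallest EV total whose cumulative probability reaches q."""
    cumulative = 0.0
    for k, p in enumerate(pmf):
        cumulative += p
        if cumulative >= q - tol:  # tol absorbs floating-point rounding in the running sum
            return k
    return len(pmf) - 1


def prob_at_least(pmf: list[float], threshold: int = 270) -> float:
    """Probability of winning at least `threshold` electoral votes."""
    return sum(pmf[threshold:])


# Toy distribution (hypothetical): every total from 250 to 349 equally likely.
pmf = [0.0] * 539
for k in range(250, 350):
    pmf[k] = 0.01

print("Mean EV:", round(ev_mean(pmf), 1))
print("Median EV:", ev_quantile(pmf, 0.5))
print("95% interval:", ev_quantile(pmf, 0.025), "to", ev_quantile(pmf, 0.975))
print(f"P(EV >= 270): {prob_at_least(pmf):.2f}")
```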

I'd love to see what happens if I put Nate Silver into the equation. Obviously, his forecast will drag the distribution down. I might look into modeling weights at that point, too, because both Holman and Wang predicted the electoral votes better than Silver did, and I believe Wang did a slightly better job than Holman, although I forget.

Anyway, there you have it. Rest easy and VOTE.

about

Malark-O-blog published news and commentary about the statistical analysis of the comparative truthfulness of the 2012 presidential and vice presidential candidates. It has since closed down while its author makes bigger plans.

author

Brash Equilibrium is an evolutionary anthropologist and writer. His real name is Benjamin Chabot-Hanowell. His wife calls him Babe. His daughter calls him Papa.

what is malarkey?

It's a polite word for bullshit. Here, it's a measure of falsehood. 0 means you're truthful on average. 100 means you're 100% full of malarkey. Details.

what is simulated malarkey?

Fact checkers only rate a small sample of the statements that politicians make. How uncertain are we about the real truthfulness of politicians? To find out, treat fact checker report cards like an experiment, and use random number generators to repeat that experiment a lot of times to see all the possible outcomes. Details.
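Here is one way to sketch that idea in Python, as a parametric bootstrap: treat a fact checker's report card as a sample of statements, repeatedly redraw the same number of statements from the observed mix of ratings, and recompute the score each time. The fraction of simulated scores below 50 is then a "percent certain" statement like the ones further down this page. The rating-to-malarkey values and the report card counts are hypothetical; the blog's actual scoring formula isn't given here.

```python
import random

# Hypothetical mapping from fact-checker ratings to malarkey (0 = true, 100 = false).
# These category values are an assumption for illustration, not the blog's published formula.
RATING_MALARKEY = {"true": 0, "mostly true": 25, "half true": 50, "mostly false": 75, "false": 100}


def malarkey(counts: dict[str, int]) -> float:
    """Average malarkey of a report card: mean rating value weighted by statement counts."""
    n = sum(counts.values())
    return sum(RATING_MALARKEY[r] * c for r, c in counts.items()) / n


def simulate_malarkey(counts: dict[str, int], n_sims: int = 10_000, seed: int = 0) -> list[float]:
    """Re-run the 'fact-check experiment' by resampling statements from the observed rating mix."""
    rng = random.Random(seed)
    ratings = list(counts)
    weights = [counts[r] for r in ratings]
    n = sum(weights)
    sims = []
    for _ in range(n_sims):
        draw = rng.choices(ratings, weights=weights, k=n)  # one simulated report card
        sims.append(sum(RATING_MALARKEY[r] for r in draw) / n)
    return sims


# Hypothetical report card (made-up counts for illustration only).
card = {"true": 12, "mostly true": 15, "half true": 14, "mostly false": 10, "false": 9}
sims = simulate_malarkey(card)
p_under_half = sum(s < 50 for s in sims) / len(sims)
print(f"Observed malarkey: {malarkey(card):.0f}")
print(f"Certainty the speaker is less than half full of malarkey: {p_under_half:.0%}")
```

Fixing the seed makes the simulation reproducible; with fewer rated statements, the simulated scores spread out more and the certainty drops.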

malark-O-glimpse

Can you tell the difference between the 2012 presidential election tickets from just a glimpse at their simulated malarkey score distributions?

Dark = presidential candidate, light = vice presidential candidate.

fuzzy portraits of malarkey

Simulated distributions of malarkey for each 2012 presidential candidate with 95% confidence interval on either side of the simulated average malarkey score. White line at half truthful. (Rounded to nearest whole number.)

87% certain Obama is less than half full of malarkey.
100% certain Romney is more than half full of malarkey.
66% certain Biden is more than half full of malarkey.
70% certain Ryan is more than half full of malarkey.

(Probabilities rounded to nearest percent.)

fuzzy portraits of ticket malarkey

Simulated distributions of collated and average malarkey for each 2012 presidential election ticket, with 95% confidence interval labeled on either side of the simulated malarkey score. White line at half truthful. (Rounded to nearest whole number.)

(Figure dated 2012-10-16.)

81% certain Obama/Biden's collective statements are less than half full of malarkey.
100% certain Romney/Ryan's collective statements are more than half full of malarkey.
51% certain the Democratic candidates are on average less than half full of malarkey.
97% certain the Republican candidates are on average more than half full of malarkey.
95% certain the candidates' statements are on average more than half full of malarkey.
93% certain the candidates themselves are on average more than half full of malarkey.

(Probabilities rounded to nearest percent.)

Comparisons

Simulated probability distributions of the difference between the malarkey scores of one 2012 presidential candidate or party and another, with 95% confidence interval labeled on either side of the simulated mean difference. Blue bars are when Democrats spew more malarkey, red when Republicans do. White line and purple bar at equal malarkey. (Rounded to nearest hundredth.)

100% certain Romney spews more malarkey than Obama.
55% certain Ryan spews more malarkey than Biden.
100% certain Romney/Ryan collectively spew more malarkey than Obama/Biden.
94% certain the Republican candidates spew more malarkey on average than the Democratic candidates.

(Probabilities rounded to nearest percent.)

2012 prez debates

presidential debates

Simulated probability distribution of the malarkey spewed by individual 2012 presidential candidates during debates, with 95% confidence interval labeled on either side of simulated mean malarkey. White line at half truthful. (Rounded to nearest whole number.)

66% certain Obama was more than half full of malarkey during the 1st debate.
81% certain Obama was less than half full of malarkey during the 2nd debate.
60% certain Obama was less than half full of malarkey during the 3rd debate.

(Probabilities rounded to nearest percent.)

78% certain Romney was more than half full of malarkey during the 1st debate.
80% certain Romney was less than half full of malarkey during the 2nd debate.
66% certain Romney was more than half full of malarkey during the 3rd debate.

(Probabilities rounded to nearest percent.)

aggregate 2012 prez debate

Distributions of malarkey for collated 2012 presidential debate report cards and the average presidential debate malarkey score.

68% certain Obama's collective debate statements were less than half full of malarkey.
68% certain Obama was less than half full of malarkey during the average debate.
67% certain Romney's collective debate statements were more than half full of malarkey.
57% certain Romney was more than half full of malarkey during the average debate.

(Probabilities rounded to nearest percent.)

2012 vice presidential debate

60% certain Biden was less than half full of malarkey during the vice presidential debate.
89% certain Ryan was more than half full of malarkey during the vice presidential debate.

(Probabilities rounded to nearest percent.)

overall 2012 debate performance

Malarkey score from collated report card comprising all debates, and malarkey score averaged over candidates on each party's ticket.

72% certain Obama/Biden's collective statements during the debates were less than half full of malarkey.
67% certain the average Democratic ticket member was less than half full of malarkey during the debates.
87% certain Romney/Ryan's collective statements during the debates were more than half full of malarkey.
88% certain the average Republican ticket member was more than half full of malarkey during the debates.

(Probabilities rounded to nearest percent.)

2012 debate self comparisons

Simulated probability distributions of the difference between the malarkey a 2012 presidential candidate usually spews and the malarkey they spewed during a debate (or aggregate debate), with 95% confidence interval labeled on either side of the simulated mean difference. Light bars mean less malarkey was spewed during the debate than usual; dark bars mean more. White bar at equal malarkey. (Rounded to nearest hundredth.)

individual 2012 presidential debates

80% certain Obama spewed more malarkey during the 1st debate than he usually does.
84% certain Obama spewed less malarkey during the 2nd debate than he usually does.
52% certain Obama spewed more malarkey during the 3rd debate than he usually does.

51% certain Romney spewed more malarkey during the 1st debate than he usually does.
98% certain Romney spewed less malarkey during the 2nd debate than he usually does.
68% certain Romney spewed less malarkey during the 3rd debate than he usually does.

(Probabilities rounded to nearest percent.)

aggregate 2012 presidential debate

58% certain Obama's statements during the debates were more full of malarkey than they usually are.
56% certain Obama spewed more malarkey than he usually does during the average debate.
73% certain Romney's statements during the debates were less full of malarkey than they usually are.
86% certain Romney spewed less malarkey than he usually does during the average debate.

(Probabilities rounded to nearest percent.)

vice presidential debate

70% certain Biden spewed less malarkey during the vice presidential debate than he usually does.
86% certain Ryan spewed more malarkey during the vice presidential debate than he usually does.

(Probabilities rounded to nearest percent.)

2012 opponent comparisons

Simulated probability distributions of the difference in malarkey between the Republican candidate and the Democratic candidate during a debate, with 95% confidence interval labeled on either side of the simulated mean difference. Blue bars are when Democrats spew more malarkey, red when Republicans do. White bar at equal malarkey. (Rounded to nearest hundredth.)

individual 2012 presidential debates

60% certain Romney spewed more malarkey during the 1st debate than Obama.
49% certain Romney spewed more malarkey during the 2nd debate than Obama.
72% certain Romney spewed more malarkey during the 3rd debate than Obama.

(Probabilities rounded to nearest percent.)

aggregate 2012 presidential debate

74% certain Romney's statements during the debates were more full of malarkey than Obama's.
67% certain Romney was more full of malarkey than Obama during the average debate.

(Probabilities rounded to nearest percent.)

vice presidential debate

92% certain Ryan spewed more malarkey than Biden during the vice presidential debate.

(Probabilities rounded to nearest percent.)

overall 2012 debate comparison

Party comparison of 2012 presidential ticket members' collective and individual average malarkey scores during debates.

88% certain that Republican ticket members' collective statements were more full of malarkey than Democratic ticket members'.
86% certain that the average Republican candidate spewed more malarkey during the average debate than the average Democratic candidate.

(Probabilities rounded to nearest percent.)

observe & report

Below are the observed malarkey scores of the 2012 presidential candidates, along with comparisons between them.

2012 prez candidates

Truth-O-Meter only (observed)

candidate	malarkey
Obama	44
Biden	48
Romney	55
Ryan	58

The Fact Checker only (observed)

candidate	malarkey
Obama	53
Biden	58
Romney	60
Ryan	47

Averaged over fact checkers

candidate	malarkey
Obama	48
Biden	53
Romney	58
Ryan	52

2012 Red prez vs. Blue prez

Collated bullpucky

ticket	malarkey
Obama/Biden	46
Romney/Ryan	56

Average bullpucky

ticket	malarkey
Obama/Biden	48
Romney/Ryan	58

2012 prez debates

1st presidential debate

opponent	malarkey
Romney	61
Obama	56

2nd presidential debate (town hall)

opponent	malarkey
Romney	31
Obama	33

3rd presidential debate

opponent	malarkey
Romney	57
Obama	46

collated presidential debates

opponent	malarkey
Romney	54
Obama	46

average presidential debate

opponent	malarkey
Romney	61
Obama	56

vice presidential debate

opponent	malarkey
Ryan	68
Biden	44

collated debates overall

ticket	malarkey
Romney/Ryan	57
Obama/Biden	46

average debate overall

ticket	malarkey
Romney/Ryan	61
Obama/Biden	56

the raw deal

You've come this far. Why not just check out the raw data Malark-O-Meter is using? I promise you: it is as riveting as a phone book.
